# Visual Scene Understanding

Distill Any Depth Small Hf

Distill-Any-Depth is a state-of-the-art (SOTA) monocular depth estimation model trained with knowledge distillation, designed for efficient and accurate depth estimation.
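A monocular depth model of this kind typically outputs one depth value per pixel. As a rough illustration of how such an output might be prepared for viewing, the sketch below min-max scales a small depth map to 8-bit grayscale. The `normalize_depth` helper and the sample values are hypothetical; they are not part of Distill-Any-Depth or any library, and the model's actual output format is not described here.

```python
def normalize_depth(depth):
    """Min-max scale a 2D depth map (a list of rows) to integers in 0..255."""
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant depth map carries no contrast; map everything to 0.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (value - lo) / span) for value in row] for row in depth]


# Hypothetical 2x3 depth map (arbitrary units), standing in for model output.
depth_map = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]

gray = normalize_depth(depth_map)
for row in gray:
    print(row)
```

The nearest pixel maps to 0 and the farthest to 255; whether near should appear dark or bright is a display choice.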

LLaVA-SpaceSGG is a visual question-answering model based on LLaVA-v1.5-13b that focuses on scene graph generation. It interprets image content and generates structured scene descriptions.
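A scene graph is commonly represented as a set of (subject, predicate, object) relations between detected objects. The sketch below shows one possible way to hold and query such a structure. The `Relation` class, the `objects_related_to` helper, and the example triples are all hypothetical; the exact output format of LLaVA-SpaceSGG is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One edge of a scene graph: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


def objects_related_to(graph, name):
    """Return (predicate, other object) pairs for relations whose subject is `name`."""
    return [(r.predicate, r.obj) for r in graph if r.subject == name]


# Hypothetical scene graph for an image of a desk.
scene_graph = [
    Relation("cup", "on", "table"),
    Relation("laptop", "on", "table"),
    Relation("cup", "left of", "laptop"),
]

print(objects_related_to(scene_graph, "cup"))
```

Storing relations as flat triples keeps the structure easy to serialize and filter; a graph library would be a reasonable alternative for larger scenes.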

Dpt Dinov2 Giant Nyu

A DPT model that uses DINOv2 as its backbone network for monocular depth estimation.

© 2025 AIbase